Overview

Dataset statistics

Number of variables 10
Number of observations 67028
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 5.1 MiB
Average record size in memory 80.0 B
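The overview counts can be recomputed from raw records. Below is a minimal stdlib sketch. It assumes the rows are available as equal-length tuples and that a missing cell is stored as `None`; the report does not say how the data was loaded. `dataset_stats` is a hypothetical helper name. The memory line in the sketch is only arithmetic under an assumed 8 bytes per cell (an `int64` value or an object pointer). It shows that 67028 rows × 10 columns × 8 B lands at about 5.11 MiB, or 80 B per record.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Row = Tuple[Optional[object], ...]


def dataset_stats(rows: Sequence[Row]) -> dict:
    """Overview statistics for a table given as equal-length, hashable tuples.

    Assumption: a missing cell is represented as None.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # Every copy of a row beyond its first occurrence counts as a duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


# Assumption: 8 bytes per cell, so the shallow footprint is rows x columns x 8.
print(67028 * 10 * 8 / 2**20)  # about 5.11 MiB, i.e. 80 B per record
```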

Variable types

Text 7
Numeric 2
Categorical 1

Alerts

meddra_concept_id is highly overall correlated with soc (High correlation)
soc is highly overall correlated with meddra_concept_id (High correlation)

Reproduction

Analysis started 2025-04-28 13:40:24.254427
Analysis finished 2025-04-28 13:40:27.898596
Duration 3.64 seconds
Software version ydata-profiling v4.16.1

Variables

ade
Text

Distinct 42256
Distinct (%) 63.0%
Missing 0
Missing (%) 0.0%
Memory size 523.8 KiB

Length

Max length 17
Median length 17
Mean length 17
Min length 17

Characters and Unicode

Total characters 1139476
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
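Python's standard `unicodedata` module exposes the general-category property mentioned above. Script and block properties are not available in it. Below is a minimal sketch; `category_counts` is a hypothetical helper. On an `ade` value it separates decimal digits (`Nd`) from the underscore (`Pc`). This report instead lists every character under `(unknown)`, which suggests its category, script and block lookups did not resolve for this dataset. The per-category, per-script and per-block tables below should be read with that in mind.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Nd', 'Lu', 'Pc')."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# Digits are 'Nd' (decimal number); '_' is 'Pc' (connector punctuation).
print(category_counts(["21600013_35707686"]))
```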

Unique

Unique 25072
Unique (%) 37.4%
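Distinct and Unique measure different things. Distinct counts the different values, while Unique counts the values that occur exactly once. Both percentages are taken over all 67028 rows. For `ade`, 42256 values are distinct but only 25072 of them occur once, so 17184 keys repeat. The Sample section shows one cause: the same `ade` key appears with different `soc` values. Below is a minimal stdlib sketch; `distinct_and_unique` is a hypothetical helper name.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %); percentages are of all rows."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, 100 * distinct / n, unique, 100 * unique / n


sample = ["21600013_35707686", "21600013_35707686", "21600013_35809243",
          "21600013_36009711"]
print(distinct_and_unique(sample))  # (3, 75.0, 2, 50.0)
```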

Sample

1st row 21600013_35707686
2nd row 21600013_35707686
3rd row 21600013_35809243
4th row 21600013_35809243
5th row 21600013_36009711

Most occurring words

Value Count Frequency (%)
21602669_35205025 15 < 0.1%
21604417_35305832 10 < 0.1%
21600448_35205236 9 < 0.1%
21604711_35205025 9 < 0.1%
21602141_36009735 9 < 0.1%
21601669_35205025 9 < 0.1%
21604712_35205025 9 < 0.1%
21603967_35205025 9 < 0.1%
21601751_35205025 9 < 0.1%
21604655_35205025 9 < 0.1%
Other values (42246) 66931 99.9%
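In every sample row, `ade` is the `atc_concept_id` and the `meddra_concept_id` joined by an underscore. For example, `21600013` and `35707686` give `21600013_35707686`. This fits the fixed length of 17 characters (8 digits + `_` + 8 digits). Below is a minimal sketch for rebuilding and splitting the key. It assumes the pattern holds for all rows, although only the sample rows confirm it. `make_ade_key` and `split_ade_key` are hypothetical names.

```python
def make_ade_key(atc_concept_id: int, meddra_concept_id: int) -> str:
    """Build an 'ade' key as '<atc_concept_id>_<meddra_concept_id>' (assumed layout)."""
    return f"{atc_concept_id}_{meddra_concept_id}"


def split_ade_key(ade: str) -> tuple:
    """Inverse of make_ade_key: return the two integer IDs."""
    atc, meddra = ade.split("_")
    return int(atc), int(meddra)


# (atc_concept_id, meddra_concept_id, ade) taken from the sample rows
rows = [
    (21600013, 35707686, "21600013_35707686"),
    (21605306, 37320158, "21605306_37320158"),
]
for atc, meddra, ade in rows:
    assert make_ade_key(atc, meddra) == ade
    assert split_ade_key(ade) == (atc, meddra)
    assert len(ade) == 17  # 8 digits + "_" + 8 digits
```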

Most occurring characters

Value Count Frequency (%)
0 163595 14.4%
1 155431 13.6%
6 139518 12.2%
3 132284 11.6%
2 131246 11.5%
5 85116 7.5%
4 75352 6.6%
7 72139 6.3%
_ 67028 5.9%
8 60592 5.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 1139476 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1139476 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1139476 100.0%
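The character table is a plain frequency count over every character of every value. Percentages are taken against the total character count, which for `ade` is 1139476 (67028 rows × 17 characters). The `_` count equals the row count, 67028, which fits one underscore per key. Below is a minimal sketch; `char_frequencies` is a hypothetical helper.

```python
from collections import Counter


def char_frequencies(values, top=10):
    """Return the `top` most common characters as (char, count, percent)."""
    counts = Counter(ch for value in values for ch in value)
    total = sum(counts.values())
    return [(ch, n, round(100 * n / total, 1)) for ch, n in counts.most_common(top)]


values = ["21600013_35707686", "21600013_35809243"]
for ch, n, pct in char_frequencies(values, top=3):
    print(ch, n, f"{pct}%")
```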

stitch_id
Text

Distinct 707
Distinct (%) 1.1%
Missing 0
Missing (%) 0.0%
Memory size 523.8 KiB

Length

Max length 12
Median length 12
Mean length 12
Min length 12

Characters and Unicode

Total characters 804336
Distinct characters 13
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 17
Unique (%) < 0.1%

Sample

1st row CID100002713
2nd row CID100002713
3rd row CID100002713
4th row CID100002713
5th row CID100002713

Most occurring words

Value Count Frequency (%)
cid100002771 1575 2.3%
cid100003032 1011 1.5%
cid100060795 937 1.4%
cid100005073 787 1.2%
cid100003345 782 1.2%
cid100004158 706 1.1%
cid100004594 634 0.9%
cid100004583 625 0.9%
cid100005514 576 0.9%
cid100003386 566 0.8%
Other values (697) 58829 87.8%

Most occurring characters

Value Count Frequency (%)
0 266797 33.2%
1 98350 12.2%
C 67028 8.3%
D 67028 8.3%
I 67028 8.3%
3 37628 4.7%
2 35844 4.5%
4 34754 4.3%
5 34182 4.2%
6 29021 3.6%
Other values (3) 66676 8.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 804336 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 804336 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 804336 100.0%

meddra_concept_name
Text

Distinct 1925
Distinct (%) 2.9%
Missing 0
Missing (%) 0.0%
Memory size 523.8 KiB

Length

Max length 54
Median length 41
Mean length 13.666408
Min length 3
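Per-row length statistics can be computed directly, as in the minimal sketch below (`length_stats` is a hypothetical helper). The reported Median length of 41 cannot be a plain per-row median. If at least half the rows had 41 or more characters, the mean would be at least 0.5 × 41 + 0.5 × 3 = 22, but the reported mean is 13.67. The report's median therefore appears to be computed some other way, and a per-row median like this one should not be expected to match it.

```python
import statistics


def length_stats(values):
    """Max, per-row median, mean and min of string lengths."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


names = ["Gingival bleeding", "Gingival bleeding", "Pain", "Pain", "Hypersensitivity"]
print(length_stats(names))
```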

Characters and Unicode

Total characters 916032
Distinct characters 58
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 277
Unique (%) 0.4%

Sample

1st row Gingival bleeding
2nd row Gingival bleeding
3rd row Pain
4th row Pain
5th row Hypersensitivity

Most occurring words

Value Count Frequency (%)
pain 3187 3.0%
disorder 2365 2.3%
dizziness 2166 2.1%
increased 1789 1.7%
oedema 1604 1.5%
infection 1381 1.3%
decreased 1343 1.3%
syndrome 1286 1.2%
chest 1067 1.0%
abdominal 980 0.9%
Other values (1672) 87385 83.6%
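The word table counts tokens, not whole values. The tokens are lowercased: the `stitch_id` sample value `CID100002713` appears in its word table as entries such as `cid100002771`. A multi-word term like `Gingival bleeding` contributes one count per word. Below is a minimal sketch, assuming lowercase-then-split-on-whitespace tokenization. ydata-profiling's own tokenizer may treat punctuation or stop words differently. `word_frequencies` is a hypothetical helper.

```python
from collections import Counter


def word_frequencies(values, top=10):
    """Lowercase each value, split on whitespace, and count the words.

    Assumption: approximates, but may not exactly match, the report's tokenizer.
    """
    counts = Counter(word for value in values for word in value.lower().split())
    total = sum(counts.values())
    return [(w, n, round(100 * n / total, 1)) for w, n in counts.most_common(top)]


names = ["Pain", "Gingival bleeding", "Pain", "Upper respiratory tract infection"]
print(word_frequencies(names, top=2))
```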

Most occurring characters

Value Count Frequency (%)
i 91364 10.0%
e 88251 9.6%
a 80855 8.8%
s 63448 6.9%
n 61938 6.8%
r 60021 6.6%
o 57344 6.3%
t 56153 6.1%
(space) 37525 4.1%
l 32738 3.6%
Other values (48) 286395 31.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 916032 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 916032 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 916032 100.0%

medgen_id
Text

Distinct 2752
Distinct (%) 4.1%
Missing 0
Missing (%) 0.0%
Memory size 523.8 KiB

Length

Max length 8
Median length 8
Mean length 8
Min length 8

Characters and Unicode

Total characters 536224
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 519
Unique (%) 0.8%

Sample

1st row C0017565
2nd row C0017565
3rd row C0030193
4th row C0234238
5th row C0020517

Most occurring words

Value Count Frequency (%)
c0039070 1581 2.4%
c0012833 1149 1.7%
c0038325 690 1.0%
c0013404 682 1.0%
c0231218 675 1.0%
c0015672 657 1.0%
c0008031 642 1.0%
c0009676 642 1.0%
c0042109 618 0.9%
c0015230 577 0.9%
Other values (2742) 59115 88.2%

Most occurring characters

Value Count Frequency (%)
0 152290 28.4%
C 67028 12.5%
2 48727 9.1%
1 47079 8.8%
3 46276 8.6%
4 33569 6.3%
5 30356 5.7%
8 28433 5.3%
9 28237 5.3%
7 27465 5.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 536224 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 536224 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 536224 100.0%
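Both identifier columns have fixed layouts, and the character tables support them:

- Every `stitch_id` is 12 characters long. It contains `C`, `I` and `D` 67028 times each, one per row, which fits a `CID` prefix followed by 9 digits.
- Every `medgen_id` is 8 characters long with one `C` per row. That fits `C` followed by 7 digits, the layout of UMLS Concept Unique Identifiers.

Below is a minimal validation sketch. The regular expressions are inferred from these statistics and the sample rows, not taken from a published schema. `invalid_ids` is a hypothetical helper.

```python
import re

# Patterns inferred from this report's statistics and sample rows (assumption).
STITCH_ID = re.compile(r"CID\d{9}")
MEDGEN_ID = re.compile(r"C\d{7}")


def invalid_ids(values, pattern):
    """Return the values that do not fully match `pattern`."""
    return [v for v in values if not pattern.fullmatch(v)]


print(invalid_ids(["CID100002713", "CID100032800", "cid100002713"], STITCH_ID))
print(invalid_ids(["C0017565", "C0700590", "C001756"], MEDGEN_ID))
```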

meddra_concept_id
Real number (ℝ)

High correlation 

Distinct 1925
Distinct (%) 2.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 36142362
Minimum 35104070
Maximum 43562937
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 523.8 KiB

Quantile statistics

Minimum 35104070
5th percentile 35204928
Q1 35707989
median 36009754
Q3 36718272
95th percentile 37320158
Maximum 43562937
Range 8458867
Interquartile range (IQR) 1010283
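The quantile block can be reproduced with the standard library. Below is a minimal sketch. It assumes linear interpolation between order statistics, which is pandas' default and is also what `statistics.quantiles(..., method="inclusive")` uses. With 67028 rows the interpolation choice barely matters. `quantile_summary` is a hypothetical helper. As a check on the reported figures: Range = 43562937 - 35104070 = 8458867, and IQR = 36718272 - 35707989 = 1010283.

```python
import statistics


def quantile_summary(values):
    """Min, 5th pct, Q1, median, Q3, 95th pct, max, range and IQR.

    Uses linear interpolation ('inclusive' method) between order statistics.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    twentieths = statistics.quantiles(data, n=20, method="inclusive")
    return {
        "min": data[0],
        "p5": twentieths[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": twentieths[-1],
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


print(quantile_summary(range(1, 10)))
```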

Descriptive statistics

Standard deviation 783353.51
Coefficient of variation (CV) 0.021674109
Kurtosis 20.992349
Mean 36142362
Median Absolute Deviation (MAD) 507025
Skewness 2.648735
Sum 2.4225502 × 10^12
Variance 6.1364272 × 10^11
Monotonicity Not monotonic
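Several of these statistics are simple functions of others, which gives a quick consistency check:

- CV = standard deviation / mean
- Variance = standard deviation²
- Sum = mean × number of observations
- MAD = median of |x - median|

Below is a minimal sketch. It assumes the sample (n - 1) standard deviation, as pandas uses by default, and an unscaled MAD. `dispersion` is a hypothetical helper. The assertions at the end check the reported `meddra_concept_id` figures against each other.

```python
import math
import statistics


def dispersion(values):
    """Sample standard deviation (n - 1), coefficient of variation and MAD."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    median = statistics.median(values)
    mad = statistics.median(abs(x - median) for x in values)
    return {"std": std, "cv": std / mean, "mad": mad}


# The reported meddra_concept_id figures agree with each other:
std, mean, n = 783353.51, 36142362, 67028
assert math.isclose(std / mean, 0.021674109, rel_tol=1e-6)  # CV
assert math.isclose(std**2, 6.1364272e11, rel_tol=1e-6)     # Variance
assert math.isclose(mean * n, 2.4225502e12, rel_tol=1e-6)   # Sum
```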
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
35205025 2139 3.2%
35205185 864 1.3%
36416514 848 1.3%
36718321 820 1.2%
36718347 738 1.1%
35205038 700 1.0%
36009783 646 1.0%
35809072 614 0.9%
35809134 598 0.9%
35205037 537 0.8%
Other values (1915) 58524 87.3%

Minimum 10 values

Value Count Frequency (%)
35104070 10 < 0.1%
35104074 256 0.4%
35104075 2 < 0.1%
35104076 6 < 0.1%
35104088 2 < 0.1%
35104091 2 < 0.1%
35104093 2 < 0.1%
35104094 1 < 0.1%
35104100 8 < 0.1%
35104101 40 0.1%

Maximum 10 values

Value Count Frequency (%)
43562937 4 < 0.1%
43562936 68 0.1%
43562866 1 < 0.1%
43562844 4 < 0.1%
43562827 10 < 0.1%
43562706 6 < 0.1%
43053920 3 < 0.1%
43053913 2 < 0.1%
43053882 2 < 0.1%
43053854 12 < 0.1%
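The three value tables for this variable are three views of one value-count table: sorted by count (most common), by value ascending (smallest), and by value descending (largest). Below is a minimal sketch with `collections.Counter`; `value_tables` is a hypothetical helper.

```python
from collections import Counter


def value_tables(values, k=10):
    """Return (most common, k smallest, k largest) as (value, count) pairs."""
    counts = Counter(values)
    common = counts.most_common(k)
    smallest = sorted(counts.items())[:k]
    largest = sorted(counts.items(), reverse=True)[:k]
    return common, smallest, largest


ids = [35205025, 35205025, 35205025, 35104070, 43562937, 43562937]
common, smallest, largest = value_tables(ids, k=2)
print(common)    # [(35205025, 3), (43562937, 2)]
print(smallest)  # [(35104070, 1), (35205025, 3)]
print(largest)   # [(43562937, 2), (35205025, 3)]
```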

soc
Categorical

High correlation 

Distinct 27
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 523.8 KiB
Nervous system disorders 8330
General disorders and administration site conditions 6294
Gastrointestinal disorders 6027
Psychiatric disorders 5240
Skin and subcutaneous tissue disorders 5143
Other values (22) 35994

Length

Max length 67
Median length 40
Mean length 30.509444
Min length 13

Characters and Unicode

Total characters 2044987
Distinct characters 38
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Gastrointestinal disorders
2nd row Vascular disorders
3rd row General disorders and administration site conditions
4th row General disorders and administration site conditions
5th row Immune system disorders

Common Values

Value Count Frequency (%)
Nervous system disorders 8330 12.4%
General disorders and administration site conditions 6294 9.4%
Gastrointestinal disorders 6027 9.0%
Psychiatric disorders 5240 7.8%
Skin and subcutaneous tissue disorders 5143 7.7%
Vascular disorders 4369 6.5%
Respiratory, thoracic and mediastinal disorders 4043 6.0%
Cardiac disorders 3821 5.7%
Infections and infestations 2970 4.4%
Musculoskeletal and connective tissue disorders 2855 4.3%
Other values (17) 17936 26.8%
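The High correlation alert between `meddra_concept_id` and `soc` is plausible given the data layout. Each MedDRA concept sits under a small set of system organ classes, but not always exactly one. In the sample rows, concept 35707686 (Gingival bleeding) appears under both Gastrointestinal disorders and Vascular disorders. Below is a minimal sketch that groups concepts by SOC and lists those mapped to more than one. `socs_per_concept` is a hypothetical helper.

```python
from collections import defaultdict


def socs_per_concept(pairs):
    """Map each meddra_concept_id to the set of soc values it appears with."""
    mapping = defaultdict(set)
    for concept_id, soc in pairs:
        mapping[concept_id].add(soc)
    return dict(mapping)


# (meddra_concept_id, soc) pairs from the sample rows
pairs = [
    (35707686, "Gastrointestinal disorders"),
    (35707686, "Vascular disorders"),
    (35809243, "General disorders and administration site conditions"),
    (36009711, "Immune system disorders"),
]
multi = {c: s for c, s in socs_per_concept(pairs).items() if len(s) > 1}
print(multi)
```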

Length

Histogram of lengths of the category

Most occurring words

Value Count Frequency (%)
disorders 59985 25.7%
and 31627 13.6%
system 13925 6.0%
nervous 8330 3.6%
tissue 7998 3.4%
conditions 6373 2.7%
general 6294 2.7%
site 6294 2.7%
administration 6294 2.7%
gastrointestinal 6027 2.6%
Other values (51) 79903 34.3%

Most occurring characters

Value Count Frequency (%)
s 258795 12.7%
i 193905 9.5%
r 187956 9.2%
d 177847 8.7%
(space) 166022 8.1%
e 164351 8.0%
o 140881 6.9%
n 132856 6.5%
a 130134 6.4%
t 123216 6.0%
Other values (28) 369024 18.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 2044987 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 2044987 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 2044987 100.0%

atc_concept_code
Text

Distinct 747
Distinct (%) 1.1%
Missing 0
Missing (%) 0.0%
Memory size 523.8 KiB

Length

Max length 7
Median length 7
Mean length 7
Min length 7

Characters and Unicode

Total characters 469196
Distinct characters 28
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 17
Unique (%) < 0.1%

Sample

1st row A01AB03
2nd row A01AB03
3rd row A01AB03
4th row A01AB03
5th row A01AB03

Most occurring words

Value Count Frequency (%)
n05ax12 937 1.4%
n06ab10 800 1.2%
n05ax08 787 1.2%
n06ab04 775 1.2%
n03ax11 576 0.9%
n06ab03 566 0.8%
n03ax16 521 0.8%
n06ab05 517 0.8%
j01ma02 516 0.8%
n02ab03 508 0.8%
Other values (737) 60525 90.3%
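`atc_concept_code` values are 7-character ATC codes. For example, the sample rows pair `A01AB03` with chlorhexidine. In the WHO ATC system, a full code nests five levels:

- an anatomical main group (1 letter)
- a therapeutic subgroup (2 digits)
- two further subgroup levels (1 letter each)
- the chemical substance (2 digits)

Below is a minimal sketch that expands a code into its level prefixes, which makes it easy to aggregate drugs at a coarser level. `atc_levels` is a hypothetical helper.

```python
from collections import Counter


def atc_levels(code: str) -> list:
    """Expand a 7-character ATC code into its five nested level prefixes."""
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    # Level boundaries: 1 letter, 2 digits, 1 letter, 1 letter, 2 digits.
    return [code[:end] for end in (1, 3, 4, 5, 7)]


print(atc_levels("A01AB03"))  # ['A', 'A01', 'A01A', 'A01AB', 'A01AB03']

# Example aggregation: rows per anatomical main group (level 1).
codes = ["A01AB03", "N05AX12", "N06AB10", "V04CC03"]
print(Counter(atc_levels(c)[0] for c in codes))
```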

Most occurring characters

Value Count Frequency (%)
0 123312 26.3%
A 63212 13.5%
1 47130 10.0%
B 26444 5.6%
2 25203 5.4%
N 22991 4.9%
3 19955 4.3%
C 18835 4.0%
X 14907 3.2%
5 14871 3.2%
Other values (18) 92336 19.7%

Most occurring categories

Value Count Frequency (%)
(unknown) 469196 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 469196 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 469196 100.0%

atc_concept_id
Real number (ℝ)

Distinct 747
Distinct (%) 1.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 21603242
Minimum 21600013
Maximum 21605306
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 523.8 KiB

Quantile statistics

Minimum 21600013
5th percentile 21600429
Q1 21602504
median 21603719
Q3 21604434
95th percentile 21604755
Maximum 21605306
Range 5293
Interquartile range (IQR) 1930

Descriptive statistics

Standard deviation 1408.2015
Coefficient of variation (CV) 6.5184728 × 10^-5
Kurtosis -0.55819204
Mean 21603242
Median Absolute Deviation (MAD) 843
Skewness -0.7650475
Sum 1.4480221 × 10^12
Variance 1983031.4
Monotonicity Increasing
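Here "Increasing" means the rows are stored in non-decreasing order of `atc_concept_id`. Repeated values are allowed, as the repeated 21600013 in the Sample section shows. In practice, all rows for one drug form a contiguous block and can be grouped in a single pass. Below is a minimal sketch. `is_non_decreasing` and `rows_per_drug` are hypothetical helpers, and `itertools.groupby` only groups correctly because the input is sorted.

```python
from itertools import groupby


def is_non_decreasing(values):
    """True if every value is <= the next one (repeats allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def rows_per_drug(atc_ids):
    """Count rows per atc_concept_id in one pass; requires sorted input."""
    if not is_non_decreasing(atc_ids):
        raise ValueError("input must be sorted by atc_concept_id")
    return {key: sum(1 for _ in group) for key, group in groupby(atc_ids)}


ids = [21600013, 21600013, 21600013, 21605306, 21605306]
print(rows_per_drug(ids))  # {21600013: 3, 21605306: 2}
```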
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
21604562 937 1.4%
21604718 800 1.2%
21604559 787 1.2%
21604712 775 1.2%
21604433 576 0.9%
21604711 566 0.8%
21604438 521 0.8%
21604713 517 0.8%
21603009 516 0.8%
21604272 508 0.8%
Other values (737) 60525 90.3%

Minimum 10 values

Value Count Frequency (%)
21600013 19 < 0.1%
21600034 92 0.1%
21600082 33 < 0.1%
21600083 125 0.2%
21600084 145 0.2%
21600085 8 < 0.1%
21600096 341 0.5%
21600097 359 0.5%
21600098 440 0.7%
21600099 103 0.2%

Maximum 10 values

Value Count Frequency (%)
21605306 11 < 0.1%
21605275 50 0.1%
21605272 2 < 0.1%
21605271 48 0.1%
21605270 44 0.1%
21605269 94 0.1%
21605266 4 < 0.1%
21605265 14 < 0.1%
21605264 7 < 0.1%
21605262 58 0.1%

atc_concept_name
Text

Distinct 720
Distinct (%) 1.1%
Missing 0
Missing (%) 0.0%
Memory size 523.8 KiB

Length

Max length 29
Median length 25
Mean length 11.054992
Min length 5

Characters and Unicode

Total characters 740994
Distinct characters 35
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 17
Unique (%) < 0.1%

Sample

1st row chlorhexidine
2nd row chlorhexidine
3rd row chlorhexidine
4th row chlorhexidine
5th row chlorhexidine

Most occurring words

Value Count Frequency (%)
acid 1714 2.4%
diclofenac 1011 1.4%
aripiprazole 937 1.3%
escitalopram 800 1.1%
risperidone 787 1.1%
fentanyl 782 1.1%
citalopram 775 1.1%
topiramate 576 0.8%
hydrocortisone 567 0.8%
fluoxetine 566 0.8%
Other values (738) 62106 87.9%

Most occurring characters

Value Count Frequency (%)
i 80827 10.9%
e 79612 10.7%
a 72825 9.8%
o 69389 9.4%
n 61900 8.4%
r 48423 6.5%
l 46545 6.3%
t 40601 5.5%
p 33883 4.6%
c 31748 4.3%
Other values (25) 175241 23.6%

Most occurring categories

Value Count Frequency (%)
(unknown) 740994 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 740994 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 740994 100.0%

ade_name
Text

Distinct 40387
Distinct (%) 60.3%
Missing 0
Missing (%) 0.0%
Memory size 523.8 KiB

Length

Max length 75
Median length 61
Mean length 29.7214
Min length 14

Characters and Unicode

Total characters 1992166
Distinct characters 61
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 23065
Unique (%) 34.4%

Sample

1st row chlorhexidine and Gingival bleeding
2nd row chlorhexidine and Gingival bleeding
3rd row chlorhexidine and Pain
4th row chlorhexidine and Pain
5th row chlorhexidine and Hypersensitivity

Most occurring words

Value Count Frequency (%)
and 67040 27.7%
pain 3187 1.3%
disorder 2365 1.0%
dizziness 2166 0.9%
increased 1789 0.7%
acid 1739 0.7%
oedema 1604 0.7%
infection 1381 0.6%
decreased 1343 0.6%
syndrome 1286 0.5%
Other values (2409) 158302 65.4%
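In the sample rows, `ade_name` is `atc_concept_name`, the word `and`, and `meddra_concept_name`, for example `chlorhexidine and Gingival bleeding`. Each row therefore contributes at least one `and`. That matches the word table: 67040 occurrences across 67028 rows, with the 12 extra presumably coming from component names that themselves contain "and". Below is a minimal sketch, assuming this composition holds for all rows. `make_ade_name` and `count_word` are hypothetical helpers.

```python
def make_ade_name(atc_concept_name: str, meddra_concept_name: str) -> str:
    """Compose an ade_name as '<drug> and <adverse event>' (assumed layout)."""
    return f"{atc_concept_name} and {meddra_concept_name}"


def count_word(values, word):
    """Count whitespace-separated occurrences of `word` (case-insensitive)."""
    return sum(v.lower().split().count(word) for v in values)


names = [
    make_ade_name("chlorhexidine", "Gingival bleeding"),
    make_ade_name("sincalide", "Loss of consciousness"),
]
print(names[0])                  # chlorhexidine and Gingival bleeding
print(count_word(names, "and"))  # 2
```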

Most occurring characters

Value Count Frequency (%)
a 220708 11.1%
n 190866 9.6%
(space) 175174 8.8%
i 172191 8.6%
e 167863 8.4%
o 126733 6.4%
d 117573 5.9%
r 108444 5.4%
t 96754 4.9%
s 84346 4.2%
Other values (51) 531514 26.7%

Most occurring categories

Value Count Frequency (%)
(unknown) 1992166 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1992166 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1992166 100.0%

Interactions


Correlations

meddra_concept_id atc_concept_id
meddra_concept_id 1.000 0.044
atc_concept_id 0.044 1.000

meddra_concept_id atc_concept_id
meddra_concept_id 1.000 0.069
atc_concept_id 0.069 1.000

meddra_concept_id atc_concept_id
meddra_concept_id 1.000 0.046
atc_concept_id 0.046 1.000

meddra_concept_id soc atc_concept_id
meddra_concept_id 1.000 0.796 0.078
soc 0.796 1.000 0.251
atc_concept_id 0.078 0.251 1.000
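The correlation matrices carry no method labels here, so it is not possible to tell which coefficient each one reports. For the two numeric columns, Pearson and Spearman coefficients can be computed directly. Below is a minimal stdlib sketch in which tied values receive average ranks. The helper names are hypothetical. The matrix that includes the categorical `soc` column needs an association measure for categorical data, which this sketch does not cover.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```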

Missing values

A simple visualization of nullity by column.

Nullity matrix: a data-dense display that makes patterns in data completeness easy to spot visually.

Sample

First rows

ade stitch_id meddra_concept_name medgen_id meddra_concept_id soc atc_concept_code atc_concept_id atc_concept_name ade_name
0 21600013_35707686 CID100002713 Gingival bleeding C0017565 35707686 Gastrointestinal disorders A01AB03 21600013 chlorhexidine chlorhexidine and Gingival bleeding
1 21600013_35707686 CID100002713 Gingival bleeding C0017565 35707686 Vascular disorders A01AB03 21600013 chlorhexidine chlorhexidine and Gingival bleeding
2 21600013_35809243 CID100002713 Pain C0030193 35809243 General disorders and administration site conditions A01AB03 21600013 chlorhexidine chlorhexidine and Pain
3 21600013_35809243 CID100002713 Pain C0234238 35809243 General disorders and administration site conditions A01AB03 21600013 chlorhexidine chlorhexidine and Pain
4 21600013_36009711 CID100002713 Hypersensitivity C0020517 36009711 Immune system disorders A01AB03 21600013 chlorhexidine chlorhexidine and Hypersensitivity
5 21600013_36009783 CID100002713 Urticaria C0042109 36009783 Immune system disorders A01AB03 21600013 chlorhexidine chlorhexidine and Urticaria
6 21600013_36009783 CID100002713 Urticaria C0042109 36009783 Skin and subcutaneous tissue disorders A01AB03 21600013 chlorhexidine chlorhexidine and Urticaria
7 21600013_36110708 CID100002713 Sinusitis C0037199 36110708 Infections and infestations A01AB03 21600013 chlorhexidine chlorhexidine and Sinusitis
8 21600013_36110708 CID100002713 Sinusitis C0037199 36110708 Respiratory, thoracic and mediastinal disorders A01AB03 21600013 chlorhexidine chlorhexidine and Sinusitis
9 21600013_36110715 CID100002713 Upper respiratory tract infection C0041912 36110715 Infections and infestations A01AB03 21600013 chlorhexidine chlorhexidine and Upper respiratory tract infection

Last rows

ade stitch_id meddra_concept_name medgen_id meddra_concept_id soc atc_concept_code atc_concept_id atc_concept_name ade_name
67018 21605306_35809130 CID100032800 Flushing C0016382 35809130 General disorders and administration site conditions V04CC03 21605306 sincalide sincalide and Flushing
67019 21605306_35809130 CID100032800 Flushing C0016382 35809130 Skin and subcutaneous tissue disorders V04CC03 21605306 sincalide sincalide and Flushing
67020 21605306_35809130 CID100032800 Flushing C0016382 35809130 Vascular disorders V04CC03 21605306 sincalide sincalide and Flushing
67021 21605306_35809134 CID100032800 Hyperhidrosis C0038990 35809134 General disorders and administration site conditions V04CC03 21605306 sincalide sincalide and Hyperhidrosis
67022 21605306_35809134 CID100032800 Hyperhidrosis C0038990 35809134 Skin and subcutaneous tissue disorders V04CC03 21605306 sincalide sincalide and Hyperhidrosis
67023 21605306_35809134 CID100032800 Hyperhidrosis C0700590 35809134 General disorders and administration site conditions V04CC03 21605306 sincalide sincalide and Hyperhidrosis
67024 21605306_35809134 CID100032800 Hyperhidrosis C0700590 35809134 Skin and subcutaneous tissue disorders V04CC03 21605306 sincalide sincalide and Hyperhidrosis
67025 21605306_35809243 CID100032800 Pain C0030193 35809243 General disorders and administration site conditions V04CC03 21605306 sincalide sincalide and Pain
67026 21605306_36718317 CID100032800 Loss of consciousness C0039070 36718317 Nervous system disorders V04CC03 21605306 sincalide sincalide and Loss of consciousness
67027 21605306_37320158 CID100032800 Erythema C0041834 37320158 Skin and subcutaneous tissue disorders V04CC03 21605306 sincalide sincalide and Erythema